There are several R packages for time series. We will use “astsa” developed by Stoffer; “fpp2” developed by Hyndman that contains the data for the examples in Forecasting: Principles and Practice, 2nd ed
Install the packages
install.packages(c("fpp2","forecast","astsa"),repos=c(CRAN = "http://cran.rstudio.com"))
##
## The downloaded binary packages are in
## /var/folders/rb/4fr3xbsj4zqg9xl8cxfzdj4w0000gn/T//RtmpMNruGE/downloaded_packages
Load the packages
library(astsa)
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.4 ──
## ✔ ggplot2 3.3.6 ✔ fma 2.4
## ✔ forecast 8.16 ✔ expsmooth 2.3
##
##
## Attaching package: 'fpp2'
## The following object is masked from 'package:astsa':
##
## oil
library(forecast)
“tute1.csv” contains quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation. ### Read the data into R with the following script:
tute1 <- read.csv("tute1.csv", header=TRUE)
head(tute1)
## X Sales AdBudget GDP
## 1 Mar-81 1020.2 659.2 251.8
## 2 Jun-81 889.2 589.0 290.9
## 3 Sep-81 795.0 512.5 290.8
## 4 Dec-81 1003.9 614.1 292.4
## 5 Mar-82 1057.7 647.2 279.1
## 6 Jun-82 944.4 602.0 254.0
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
#(The [,-1] removes the first column which contains the quarters as we don’t need them now.)
autoplot(mytimeseries, facets=TRUE)
“retail.xls” contains monthly Australian retail data. These represent retail sales in various categories for different Australian states. ++ Read the data
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
#The second argument (skip=1) is required because the Excel sheet has two header rows.
++ Select one of the time series as follows (but replace the column name with your own chosen column)
myts <- ts(retaildata[,"A3349873A"],
frequency=12, start=c(1982,4))
“milk.txt” contains the monthly milk production per cow from January 1962 to December 1975.
milk = ts(scan("milk.txt"), start=c(1962,1), frequency=12)
I am going to illustrate with the CARDOX time series.
data(cardox,package="astsa")
tsplot(cardox, ylab="ppm", main="Monthly Carbon Dioxide Levels at Mauna Loa, March 1958 to November 2018")
tsplot(window(cardox,start=2000,end=2010), ylab="ppm", main="Monthly Carbon Dioxide Levels, Jan 2000 to Dec 2010",type="b")
ggmonthplot(cardox)+
ylab("ppm") +
ggtitle("Seasonal plot: Cabon dioxine ")
The blue lines represent the mean of the corresponding month. Be careful because this mean has no meaning since the data presents trend.
ggseasonplot(cardox)
lag1.plot(cardox,12)
# Stl decomposition The Seasonal Decomposition of Time Series by Loess
is implemented in R in the stl() function and decomposes a time series
into seasonal, trend and irregular components using loess. The seasonal
component is found by loess smoothing the seasonal sub-series (the
series of all January values, …); if s.window = “periodic” smoothing is
effectively replaced by taking the mean. The seasonal values are
removed, and the remainder smoothed to find the trend. The overall level
is removed from the seasonal component and added to the trend component.
This process is iterated a few times. The remainder component is the
residuals from the seasonal plus trend fit.
cardox.stlper=stl(cardox, s.window="periodic")
cardox.stl=stl(cardox, s.window=13)
plot(cardox.stlper)
plot(cardox.stl)
Justo to inspect/retrieve the components
cardox.stlper$time.series[1:24,]
## seasonal trend remainder
## [1,] 1.40390911 314.8115 -0.50536160
## [2,] 2.55124311 314.9140 -0.01528735
## [3,] 2.98644602 315.0166 -0.50308202
## [4,] 2.29771005 315.1059 -0.30364134
## [5,] 0.69290846 315.1952 -0.02813505
## [6,] -1.43851114 315.2765 1.09200191
## [7,] -3.12173405 315.3578 0.96394217
## [8,] -3.21416275 315.4404 0.43379325
## [9,] -2.01642760 315.5229 -0.17651950
## [10,] -0.86647303 315.5693 -0.03285278
## [11,] 0.05509929 315.6157 -0.05080380
## [12,] 0.66999251 315.6433 0.06669096
## [13,] 1.40390911 315.6709 -0.36483765
## [14,] 2.55124311 315.7391 -0.57030318
## [15,] 2.98644602 315.8072 -0.50363762
## [16,] 2.29771005 315.9036 -0.05128026
## [17,] 0.69290846 315.9999 -0.15285728
## [18,] -1.43851114 316.0910 0.14750814
## [19,] -3.12173405 316.1821 0.77967687
## [20,] -3.21416275 316.2791 0.19504742
## [21,] -2.01642760 316.3762 0.44025413
## [22,] -0.86647303 316.4748 -0.02831731
## [23,] 0.05509929 316.5734 -0.19850649
## [24,] 0.66999251 316.6538 -0.35380621
cardox.stl$time.series[1:24,]
## seasonal trend remainder
## [1,] 1.06191289 314.9247 -0.276583676
## [2,] 2.25652460 314.9998 0.193638461
## [3,] 2.76435007 315.0750 -0.339353158
## [4,] 2.23807443 315.1502 -0.288243663
## [5,] 0.86951355 315.2190 -0.228509917
## [6,] -1.04974120 315.2878 0.691917697
## [7,] -2.65471321 315.3567 0.498062579
## [8,] -3.01690560 315.4248 0.252076479
## [9,] -1.93577654 315.4930 -0.227231067
## [10,] -0.96139298 315.5612 0.070206880
## [11,] -0.08083053 315.6158 0.085064580
## [12,] 0.50419885 315.6703 0.205455350
## [13,] 1.06933903 315.7249 -0.084264670
## [14,] 2.26280905 315.7889 -0.331661444
## [15,] 2.76483766 315.8528 -0.327616799
## [16,] 2.23740589 315.9167 -0.004111773
## [17,] 0.87238040 315.9965 -0.328910199
## [18,] -1.05771182 316.0764 -0.218641895
## [19,] -2.66319892 316.1562 0.347021279
## [20,] -3.02362377 316.2616 0.022073042
## [21,] -1.93428718 316.3669 0.367363348
## [22,] -0.96092998 316.4723 0.068633056
## [23,] -0.07844260 316.5639 -0.055446190
## [24,] 0.50714635 316.6555 -0.192627005
To investigate the correlation behaviour of the remainder. ++ with lag plots
lag1.plot(cardox.stlper$time.series[,3],12)
lag1.plot(cardox.stl$time.series[,3],12)
acf(cardox)
acf(cardox.stlper$time.series[,3])
acf2(cardox.stlper$time.series[,3], max.lag = 30)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF 0.36 0.07 -0.18 -0.29 -0.27 -0.26 -0.21 -0.13 0.01 0.09 0.21 0.27
## PACF 0.36 -0.07 -0.21 -0.18 -0.12 -0.19 -0.19 -0.16 -0.10 -0.12 0.01 0.05
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.21 0.1 -0.05 -0.16 -0.21 -0.19 -0.19 -0.10 -0.01 0.13 0.22 0.24
## PACF 0.02 0.0 -0.04 -0.04 -0.06 -0.05 -0.13 -0.09 -0.10 -0.02 -0.01 0.00
## [,25] [,26] [,27] [,28] [,29] [,30]
## ACF 0.20 0.04 -0.05 -0.16 -0.17 -0.19
## PACF 0.02 -0.07 -0.01 -0.06 -0.01 -0.09
The correlation in the series remainder is much less than in the original time series. Most of the trend and seasonality has been removed but there is still correlation in the data that must be accounted for.
The difference operator, \(\nabla= 1-B\) where \(B\) is the lag operator \(B x_t=x_{t-1}\). So \[\nabla x_t = (1-B ) x_t =x_t -B x_{t} = x_t- x_{t-1}\]
The resulting time series \(y_t=\nabla x_t\) represents the increments or change of \(x\) on consecutive time points. Take the price of chicken, \(x_t\). Then \(y_t = x_t- x_{t-1}\) represents the monthly increase of price: from January to February, February to March, etc.
The seasonal difference operator is defined as \(\nabla^S=1-B^S\), where \(S\) is the seasonality. Remember that \(B^S x_t= x_{t-S}\) so \(\nabla^S x_t= x_t - x_{t-S}\) and representes the increments over the seasonal period, usually annual increments.
par(mfrow=c(2,1))
tsplot(cardox, ylab="ppm", col=4, lwd=2)
tsplot(diff(cardox), ylab="ppm", col=4, lwd=2, main="Monthly increments of carbom dioxide")
acf2(diff(cardox))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF 0.7 0.24 -0.21 -0.46 -0.51 -0.5 -0.51 -0.45 -0.19 0.24 0.71 0.92
## PACF 0.7 -0.48 -0.31 -0.06 -0.12 -0.4 -0.54 -0.51 -0.35 -0.22 0.18 0.39
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.71 0.24 -0.20 -0.45 -0.51 -0.49 -0.50 -0.44 -0.20 0.25 0.70 0.91
## PACF 0.17 -0.01 0.02 0.04 0.03 0.11 0.03 -0.02 -0.07 -0.05 0.02 0.12
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF 0.70 0.23 -0.20 -0.45 -0.50 -0.49 -0.49 -0.43 -0.19 0.25 0.69 0.89
## PACF 0.05 -0.11 0.03 0.00 0.03 0.02 0.02 0.08 0.03 0.00 0.05 0.05
## [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.68 0.22 -0.20 -0.44 -0.49 -0.48 -0.48 -0.42 -0.18 0.24 0.69 0.87
## PACF 0.02 -0.06 0.01 0.04 0.05 -0.01 0.02 0.04 0.00 -0.05 0.06 0.03
tsplot(diff(cardox,12), ylab="ppm", col=4, lwd=2, main="Annual increments of carbon dioxide")
acf2(diff(cardox,12))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## ACF 0.83 0.77 0.71 0.68 0.65 0.60 0.56 0.52 0.48 0.40 0.35 0.24 0.29
## PACF 0.83 0.25 0.06 0.09 0.08 -0.04 -0.02 -0.01 -0.02 -0.15 -0.04 -0.24 0.36
## [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25]
## ACF 0.30 0.29 0.28 0.26 0.26 0.25 0.25 0.25 0.26 0.26 0.27 0.28
## PACF 0.15 -0.01 0.05 0.02 0.03 -0.03 0.01 0.00 -0.03 -0.02 -0.09 0.24
## [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37]
## ACF 0.29 0.32 0.32 0.33 0.33 0.33 0.35 0.35 0.36 0.35 0.35 0.35
## PACF 0.11 0.06 0.00 0.01 -0.01 -0.04 0.05 0.02 -0.01 -0.06 -0.07 0.16
## [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.33 0.32 0.32 0.32 0.32 0.33 0.33 0.33 0.33 0.34 0.33
## PACF 0.00 0.04 0.02 0.05 0.00 0.02 0.11 -0.01 0.02 -0.01 -0.09
tsplot(diff(diff(cardox),12), ylab="ppm", col=4, lwd=2, main="Detrended and deseasonalized carbon dioxide")
acf2(diff(diff(cardox),12))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## ACF -0.32 0.01 -0.08 -0.02 0.06 -0.02 0.00 -0.01 0.11 -0.07 0.16 -0.46
## PACF -0.32 -0.11 -0.12 -0.10 0.01 -0.01 -0.01 -0.01 0.12 0.01 0.18 -0.40
## [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24]
## ACF 0.12 0.06 -0.02 0.04 -0.08 0.05 -0.04 0.00 -0.06 0.03 -0.01 -0.02
## PACF -0.17 -0.02 -0.07 -0.04 -0.04 0.02 -0.01 -0.03 0.01 -0.01 0.07 -0.27
## [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
## ACF 0.02 -0.07 0.08 -0.03 0.04 -0.02 -0.03 0.03 0.00 0.05 -0.02 0.02
## PACF -0.14 -0.09 -0.02 -0.02 -0.01 0.04 -0.05 -0.03 -0.01 0.04 0.06 -0.18
## [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44] [,45] [,46] [,47] [,48]
## ACF 0.03 0.00 -0.05 -0.01 0.02 -0.02 0.01 0.02 -0.02 -0.02 0.05 -0.03
## PACF -0.02 -0.05 -0.04 -0.06 -0.02 -0.02 -0.12 0.00 -0.05 0.00 0.09 -0.12
The twice differenced series does not present trend or seasonality but there is still correlation in the data that must be accounted for.